-operation g \
-nastring . \
-vcfinpu
The annotations of all SARS-CoV-2 variants are shown in Figure 4.22.
4.5 SUMMARY
High-throughput sequencing makes variant discovery much easier than traditional methods
such as microarrays. Raw data obtained from sequencing are used to detect variants,
including base substitutions, insertions, deletions, and structural variants. Variants can
occur in any region of the genome; however, studies usually focus on variants that affect
gene function. The consequences of a variant depend on the affected region. A variant may
be deleterious and implicated in health conditions and diseases such as cancer, or it may
give rise to a new bacterial or viral strain that is more infectious and lethal, like the
recent variants of SARS-CoV-2, or more antibiotic-resistant, as in some bacteria. This is
why variant discovery from sequencing data has gained importance and is widely used in
genetics, medical diagnosis, and drug discovery.
Greater sequencing depth, paired-end sequencing, and the use of long reads make variant
detection more accurate and allow the detection of large-scale variants such as structural
variants, insertions, and deletions.
Variant calling pipelines use SAM/BAM files from whole-genome, whole-transcriptome, or
targeted gene sequencing to find the bases in a sample that differ from the bases at the
same positions in the reference genome. Variant calling programs follow one of two
approaches. The first, used by bcftools, is based on a consensus sequence formed by
collapsing the piled-up aligned reads. The second, used by more recent variant callers such
as GATK, is based on haplotypes, that is, groups of variants that are likely to be inherited
together. GATK 4 is the most commonly used program for variant calling. It follows a
workflow known as the GATK Best Practices pipeline, which is designed to yield accurate
variant calls.
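As a reminder of how the two approaches look on the command line, the following is a minimal sketch rather than a complete pipeline. The function names and all file names are hypothetical placeholders. It assumes that bcftools and GATK 4 are installed, that the reference FASTA is indexed (GATK additionally expects a sequence dictionary), and that the BAM file is sorted, indexed, and carries read groups.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file names are illustrative assumptions.
# Arguments for both functions: $1 = reference FASTA, $2 = sorted BAM, $3 = output VCF (.vcf.gz)

# Pileup/consensus approach: bcftools piles up the aligned reads on the
# reference, then calls variants (-m: multiallelic caller, -v: variant sites
# only, -Oz: bgzip-compressed VCF output).
call_with_bcftools() {
    bcftools mpileup -f "$1" "$2" | bcftools call -mv -Oz -o "$3"
}

# Haplotype-based approach: GATK 4 HaplotypeCaller.
call_with_gatk() {
    gatk HaplotypeCaller -R "$1" -I "$2" -O "$3"
}

# Example invocation (placeholder file names):
#   call_with_bcftools reference.fasta sample.sorted.bam sample.bcftools.vcf.gz
#   call_with_gatk     reference.fasta sample.sorted.bam sample.gatk.vcf.gz
```

Wrapping each caller in a function keeps the two approaches side by side and lets the same inputs be passed to both, which is convenient when comparing their calls on one sample.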
After variants have been identified with a variant calling program and filtered, they can
be annotated, that is, assigned functional information, using annotation programs.
FIGURE 4.22 All-variant annotation file of SARS-CoV-2.